We will use sna (Butts 2016) and network (Butts 2015)
require(sna)
## Loading required package: sna
## Loading required package: statnet.common
##
## Attaching package: 'statnet.common'
## The following objects are masked from 'package:base':
##
## attr, order
## Loading required package: network
##
## 'network' 1.18.2 (2023-12-04), part of the Statnet Project
## * 'news(package="network")' for changes since last version
## * 'citation("network")' for citation information
## * 'https://statnet.org' for help, support, and other information
## sna: Tools for Social Network Analysis
## Version 2.7-2 created on 2023-12-05.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
## For citation information, type citation("sna").
## Type help(package="sna") to get started.
require(network)
We are looking at the s50 dataset, which is further described here: https://www.stats.ox.ac.uk/~snijders/siena/s50_data.htm
This dataset is available in zipped format online.
temp <- tempfile()
download.file("https://www.stats.ox.ac.uk/~snijders/siena/s50_data.zip",temp)
X <- as.matrix( read.table(unz(temp, "s50-network1.dat")) )
sport <- read.table(unz(temp, "s50-sport.dat"))
smoke <- read.table(unz(temp, "s50-smoke.dat"))
alcohol <- read.table(unz(temp, "s50-alcohol.dat"))
unlink(temp)
n <- nrow(X)
smoke <- (smoke[,2] %in% c(2,3))+0 # use wave 2; recode values 2 and 3 to 1 (smoking) and 1 to 0 (non-smoking)
sport <- sport[,1]-1 # dichotomise sporty (recode 1,2 to 0,1)
There are a lot of network metrics implemented in `sna` (check the help file). We will only look at a selection.
The most important first step is to plot the network and get a feeling for it
gplot(X)
par(mfrow=c(1,2))# set up two panels
plot(table( degree( X , cmode = "indegree")),main='Indegrees')
plot(table( degree( X , cmode = "outdegree")),main='Outdegrees')
dyad.census(X)
## Mut Asym Null
## [1,] 39 35 1151
gtrans(X)
## [1] 0.3873874
triad.census(X)
## 003 012 102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U 120C 210
## [1,] 16243 1470 1724 5 18 21 42 30 5 0 15 6 5 2 9
## 300
## [1,] 5
The variable Alcohol use is coded as
| Alcohol value | meaning |
|---|---|
| 1 | none |
| 2 | once or twice a year |
| 3 | once a month |
| 4 | once a week |
| 5 | more than once a week |
We are now going to investigate whether your friends' alcohol use tends to influence your alcohol use. Start by plotting the network with
gplot(X,vertex.cex=alcohol[,2]/2,
vertex.col = ifelse(smoke==1,'red','green') )
Looking at the figure, do you see any evidence of social influence?
Big nodes seem to be tied to other big nodes and small nodes to other small nodes. Smokers also seem to hang out with other smokers.
Make the assumption that the levels of alcohol use can be treated as an interval-level, continuous variable. To model the outcomes, we start by assuming that outcomes are independent across students, using the regression model
\[ Y_i = \beta_0+\beta_1 m_{i,smoke} + \beta_2 m_{i,sport}+\epsilon_i \]
where \(m_{i,smoke}\) and \(m_{i,sport}\) are the smoking and sport indicators for student \(i\), and \(\epsilon_i \thicksim N(0,\sigma^2)\) independently.
y <- alcohol[,2]
ans.ols <- lm(y ~ smoke+sport)
summary(ans.ols)
##
## Call:
## lm(formula = y ~ smoke + sport)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0922 -0.6918 0.1690 0.8232 2.5694
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.4306 0.3169 7.669 7.98e-10 ***
## smoke 1.4004 0.3084 4.541 3.90e-05 ***
## sport 0.2612 0.3331 0.784 0.437
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.021 on 47 degrees of freedom
## Multiple R-squared: 0.305, Adjusted R-squared: 0.2755
## F-statistic: 10.31 on 2 and 47 DF, p-value: 0.0001933
What conclusions can you draw from the ANOVA table of the OLS regression regarding the regressors smoke and sport?
Smokers have on average a 1.4 higher score on drinking, and the coefficient is significantly different from 0 even at the 0.1%-level. There is no evidence that sporty people drink more or less than less sporty people, as that coefficient is not significantly different from 0.
We assumed that the errors were normally distributed so the residuals
\[ \hat{e}_i= y_i - \hat{y}_i= y_i - M_i\hat{\beta} \]
should be normally distributed:
hist(ans.ols$residuals)
The residuals seem reasonably ‘normal’
We are now going to investigate other properties of the model.
For the sna package, we need to put all covariates,
including the constant, in a matrix
\[ \mathbf{M} = \begin{pmatrix} M_{1}\\ M_{2}\\ \vdots\\ M_{n}\\ \end{pmatrix}= \begin{pmatrix} 1 & m_{1,1} & \cdots & m_{1,p}\\ 1 & m_{2,1} & \cdots & m_{2,p}\\ \vdots & \vdots & \ddots & \vdots\\ 1 & m_{n,1} & \cdots & m_{n,p}\\ \end{pmatrix} \]
We thus put all covariates into the same matrix:
M <- cbind(matrix(1,n,1),smoke,sport)
colnames(M) <- c('cons','smoke','sport')
head(M)
## cons smoke sport
## [1,] 1 0 1
## [2,] 1 1 0
## [3,] 1 0 0
## [4,] 1 0 1
## [5,] 1 0 1
## [6,] 1 0 1
Let us investigate whether the residuals are independent across the network. In particular, if the outcome of \(i\) is completely independent of the outcome of \(j\), then we would not expect any difference between \((\hat{e}_i - \bar{e})(\hat{e}_j-\bar{e})\) for a pair connected through a tie, \(x_{ij}=1\), and \((\hat{e}_i - \bar{e})(\hat{e}_k-\bar{e})\) for a pair that is not connected, \(x_{ik}=0\). In other words, higher (lower) than predicted values should not be expected just because a network partner has a higher (lower) than predicted value. Note that the average \(\bar{e}=0\) by construction and is included only for clarity. Moran's I measures exactly whether connected partners tend to have more similar residuals than unconnected people. In terms of the outcome variable
\[ I_k =\frac{n}{\sum_{i=1}^n\sum_{j=1}^n X_{ij}^{(k)}} \frac{\sum_{i=1}^n \sum_{j=1}^n (y_i-\bar{y}) (y_j-\bar{y})X_{ij}^{(k)} }{\sum_{j=1}^n (y_j-\bar{y})^2} \]
Where \(X_{ij}^{(k)}\) is the \(k\)-step adjacency matrix, i.e. \(X_{ij}^{(1)}\) is the matrix of people that are directly connected, \(X_{ij}^{(2)}\) is the matrix of people that are connected at distance 2, etc. The step \(k\) is sometimes called lag. Intuitively, if \(X_{ij}^{(k)}\) doesn’t matter, then many cross-product terms should cancel others out. It can be shown that, under the assumption that there is no network correlation at lag \(k=1\), then the expected value is
\[ E(I_1) = \frac{-1}{N-1} \]
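To make the definition concrete, here is a minimal base-R sketch of \(I_1\) computed directly from the formula above (the function name and toy matrix are illustrative helpers, not part of `sna`):

```r
# Moran's I at lag 1, computed directly from the formula above
moran.lag1 <- function(X, y) {
  d <- y - mean(y)                    # deviations from the mean
  n <- length(y)
  (n / sum(X)) * sum(outer(d, d) * X) / sum(d^2)
}

# Toy check: a 4-cycle with perfectly alternating values gives
# perfect negative autocorrelation, I = -1
X.toy <- matrix(0, 4, 4)
X.toy[cbind(1:4, c(2, 3, 4, 1))] <- 1
X.toy <- X.toy + t(X.toy)             # symmetrise the ring
moran.lag1(X.toy, c(1, 2, 1, 2))      # -1
```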
Here we want to check if this seems to hold true for our residuals - are they uncorrelated across network partners?
nacf(X,ans.ols$residuals,type="moran")
## 0 1 2 3 4 5
## 1.00000000 0.19495562 -0.17048356 0.02784058 -0.04684902 -0.01678792
## 6 7 8 9 10 11
## 0.03002702 -0.06753150 -0.10455140 0.00000000 0.00000000 0.00000000
## 12 13 14 15 16 17
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## 18 19 20 21 22 23
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## 24 25 26 27 28 29
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## 30 31 32 33 34 35
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## 36 37 38 39 40 41
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## 42 43 44 45 46 47
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## 48 49
## 0.00000000 0.00000000
# nacf(X,ans.ols$residuals,type="geary") # Geary's C is another measure of autocorrelation but is harder to understand
That \(I_0=1\) is because this is the correlation of residuals with themselves!
We are only really interested in shorter lags, so let us plot the first 4 and add a reference line for \(E(I_1)\) under no correlation
plot(nacf(X,ans.ols$residuals,type="moran")[1:4],type='b')
abline(h=-1/(n-1),col='red')
When modelling social influence we may want to scale the network ties to take into account how many ties people have. Leenders (2002) proposes a number of ways in which we can scale the adjacency matrix. Here we create a weight matrix
\[ \mathbf{W} = \begin{bmatrix} W_{11} & W_{12} & \cdots & W_{1n}\\ W_{21} & W_{22} & \cdots & W_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ W_{n1} & W_{n2} & \cdots & W_{nn}\\ \end{bmatrix} \]
We want each arc to have the weight
\[ W_{ij} = X_{ij}/d_{i,out} \]
We can do this easily, BUT we have to be careful not to divide by 0 (we define \(0/0\) as \(0\))
degrees <- degree(X, cmode = "outdegree")
W <- X
W[degrees>0,] <- X[degrees>0,]/degrees[degrees>0]
# You can check that we are actually normalising correctly
# Wcopy <- X
# for (i in c(1:n))
# {
# if (degrees[i]>0)
# {
# Wcopy[i,] <- X[i,]/degrees[i]
# }
#
#}
#sum( W != Wcopy)
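The same normalisation can also be written compactly with `rowSums()`; a base-R sketch on a toy matrix (the function name and toy matrix are illustrative):

```r
# Row-normalise an adjacency matrix, defining 0/0 as 0
row.normalise <- function(X) {
  d <- rowSums(X)                       # outdegrees
  W <- X
  W[d > 0, ] <- X[d > 0, ] / d[d > 0]   # divide each non-empty row by its sum
  W
}

X.toy <- rbind(c(0, 1, 1), c(1, 0, 0), c(0, 0, 0))  # node 3 sends no ties
rowSums(row.normalise(X.toy))           # 1 1 0: rows sum to 1, except empty rows
```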
Check the residuals again
plot(nacf(W,ans.ols$residuals,type="moran")[1:4],type='b')
Do we see any difference? In the formula for \(I_k\) the scaling factors \(d_i\) will actually cancel out.
The network autocorrelation model, or network disturbances model (Marsden and Friedkin 1993), looks exactly like the OLS \[ y_i = M_i \beta + \epsilon_i\text{,} \]
but we no longer assume that the residuals are independent. Instead, we induce network autocorrelation on the error terms
\[ \epsilon_i = \rho \sum_{j=1}^n W_{ij}\epsilon_j+\xi_i \]
(\(\xi\) is the Greek letter Xi - https://en.wikipedia.org/wiki/Xi_(letter)). The error terms \(\xi_i\) are assumed to be independent and identically distributed, \(\xi_i \thicksim N(0,\sigma^2)\). The interpretation is that if \(i\) has a higher than predicted value on the outcome variable, then \(j\) is more likely to also have a higher than predicted value, for all \(j\) that \(i\) has nominated.
If you know a bit of matrix algebra, you will notice that we can write the vector of disturbances \(\epsilon =(\epsilon_1,\ldots,\epsilon_n)^{\top}\) as
\[ \epsilon = \rho \mathbf{W} \epsilon + \xi \]
One immediate issue here is that we have \(\epsilon\) on both sides of the equation. We can simplify this expression by solving for \(\epsilon\)
\[ \epsilon = (I-\rho \mathbf{W})^{-1}\xi \]
You can interpret this as the error terms \(\xi_i\) ‘spreading’ on the network.
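This 'spreading' can be made explicit: for a row-normalised \(\mathbf{W}\) with \(|\rho|<1\), the inverse expands as the geometric series \((I-\rho \mathbf{W})^{-1} = I + \rho\mathbf{W} + \rho^2\mathbf{W}^2 + \cdots\), so a shock \(\xi_j\) reaches \(i\) along paths of every length, damped by powers of \(\rho\). A quick numerical sketch (the toy weight matrix is illustrative):

```r
# Check (I - rho W)^{-1} against the truncated series I + rho W + (rho W)^2 + ...
rho <- 0.4
W.toy <- matrix(c(0,   0.5, 0.5,
                  1,   0,   0,
                  0.5, 0.5, 0), 3, 3, byrow = TRUE)  # row-normalised weights
direct <- solve(diag(3) - rho * W.toy)
series <- diag(3)
term <- diag(3)
for (k in 1:50) {                       # 0.4^50 is negligible, so 50 terms suffice
  term <- term %*% (rho * W.toy)
  series <- series + term
}
max(abs(direct - series))               # numerically zero
```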
The function lnam (which presumably stands for linear network autocorrelation models) can fit this model. The formula is specified in terms of the outcome variable y and the matrix x of covariates. The weight matrix is specified as W2.
netacm.1 <-lnam(y=y, x=M, W2=W)
## Loading required namespace: numDeriv
summary(netacm.1)
##
## Call:
## lnam(y = y, x = M, W2 = W)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7914 -0.7067 0.2933 1.0298 2.4933
##
## Coefficients:
## Estimate Std. Error Z value Pr(>|z|)
## cons 2.5067 0.3157 7.939 2e-15 ***
## smoke 1.0847 0.3299 3.288 0.00101 **
## sport 0.2000 0.3026 0.661 0.50869
## rho2.1 0.6492 0.2995 2.168 0.03017 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Estimate Std. Error
## Sigma 0.9347 0.009
##
## Goodness-of-Fit:
## Residual standard error: 1.047 on 46 degrees of freedom (w/o Sigma)
## Multiple R-Squared: 0.2039, Adjusted R-Squared: 0.1346
## Model log likelihood: -68.48 on 45 degrees of freedom (w/Sigma)
## AIC: 147 BIC: 156.5
##
## Null model: meanstd
## Null log likelihood: -79.54 on 48 degrees of freedom
## AIC: 163.1 BIC: 166.9
## AIC difference (model versus null): 16.11
## Heuristic Log Bayes Factor (model versus null): 10.38
Did the conclusions about the regressors smoke and sport change?
The difference between smokers and non-smokers remains, but the effect is somewhat smaller.
Is there evidence of network autocorrelation?
The network autocorrelation parameter \(\rho\) is reported as rho2.1; it is estimated at 0.65 and is statistically significantly different from zero at the 5%-level.
Looking at the network plot, it almost looks as if high-degree nodes drink more. To take this into account, we add outdegree as an explanatory variable:
M <- cbind(matrix(1,n,1),degrees,smoke,sport )
colnames(M) <- c('cons','degree','smoke','sport')
head(M)
## cons degree smoke sport
## [1,] 1 3 0 1
## [2,] 1 4 1 0
## [3,] 1 4 0 0
## [4,] 1 4 0 1
## [5,] 1 2 0 1
## [6,] 1 2 0 1
Now re-run the network autocorrelation model (call the output netacm.2 to align with the code in the code chunk below)
netacm.2 <-lnam(y=y, x=M, W2=W)
summary(netacm.2)
##
## Call:
## lnam(y = y, x = M, W2 = W)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.91118 -0.71108 0.09031 0.60982 2.34177
##
## Coefficients:
## Estimate Std. Error Z value Pr(>|z|)
## cons 2.12466 0.39409 5.391 7e-08 ***
## degree 0.08893 0.05627 1.580 0.113994
## smoke 1.07210 0.31516 3.402 0.000669 ***
## sport 0.25294 0.29605 0.854 0.392900
## rho2.1 0.66227 0.29190 2.269 0.023278 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Estimate Std. Error
## Sigma 0.9115 0.009
##
## Goodness-of-Fit:
## Residual standard error: 1.032 on 45 degrees of freedom (w/o Sigma)
## Multiple R-Squared: 0.2479, Adjusted R-Squared: 0.1643
## Model log likelihood: -67.27 on 44 degrees of freedom (w/Sigma)
## AIC: 146.5 BIC: 158
##
## Null model: meanstd
## Null log likelihood: -79.54 on 48 degrees of freedom
## AIC: 163.1 BIC: 166.9
## AIC difference (model versus null): 16.55
## Heuristic Log Bayes Factor (model versus null): 8.902
Does accounting for outdegree change the conclusions about network autocorrelation?
No, the coefficient for degree itself is not statistically significantly different from 0, and the magnitude of the network autocorrelation parameter barely changes (from 0.65 to 0.66).
The network effects model assumes that there is correlation through the outcome variable \(\mathbf{Y}\). This is subtly different from correlation only through the error term. The outcome \(Y_i\) for \(i\) depends directly on a weighted sum of the values \(Y_j\) of the contacts of \(i\)
\[ Y_i = \rho \sum_{j=1}^n W_{ij}Y_j + M_i\beta+\epsilon_i \]
where \(\epsilon_i \thicksim N(0,\sigma^2)\) independently for all \(i\).
If you want to model social influence or social contagion, the network effects model is more appropriate than the autocorrelation model, as the former models the dependence between values of the outcome variable.
The equation for the full outcome vector can, as before, be written compactly as
\[ \mathbf{Y} = \rho \mathbf{W} \mathbf{Y} + \mathbf{M}\beta+\epsilon \]
Here, there is feedback on the outcome variable itself.
We can derive this model out of a longitudinal model
\[ Y_{i,t} = \rho \sum_{j=1}^n W_{ij}Y_{j,t-1}+\mathbf{M}\beta+\epsilon_{i,t} \]
as \(t\) tends to infinity.
For a quick illustration, let us look at the first 5 iterations of the updating according to the longitudinal form of the model
rho <- 0.4# set a strong influence parameter
beta <- matrix(summary(netacm.2)$beta[,1],4,1)# use the estimated coefficients from the last regression
Y <- matrix(rnorm(n),n,1)# random starting point
par(mfrow=c(2,3), oma = c(0,1,0,0) + 0.1,mar = c(1,0,1,1) + 0.1)
coord <- gplot(X,gmode='graph',vertex.cex = Y/2,main= paste('M ',round(nacf(W,Y[,1],type="moran")[2],3) ) )
for (k in c(1:5)){
Y <- rho * W %*% Y + M %*% beta + matrix(rnorm(n),n,1)# update according to equation; %*% for matrix multiplication
gplot(X,gmode='graph',vertex.cex = Y/2, coord=coord, main= paste('M ',round(nacf(W,Y[,1],type="moran")[2],3) ) )
}
We may or may not see increased correlation in the outcomes. Running 100 iterations makes this a little more evident
moran <- matrix(0,100,3)
rho <- .4
Y <- matrix(rnorm(n),n,1)
for (k in c(1:100)){
Y <- rho * W %*% Y + M %*% beta + matrix(rnorm(n),n,1)
moran[k,] <- nacf(W,Y[,1],type="moran")[2:4]}
par(mfrow=c(1,1))
plot(moran[,1],type='l')
lines(moran[,2],col='red')
lines(moran[,3],col='blue')
abline( h = -1/(n-1), col='grey')
We might see a slight trend upwards and correlations are generally speaking positive.
As before, \(Y\) appears on both sides of the equation, but we can solve for \(Y\)
\[ \mathbf{Y} = (I- \rho \mathbf{W})^{-1}(\mathbf{M}\beta+\epsilon) \]
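The reduced form suggests a direct way to simulate from the network effects model without iterating; a sketch (the function name is an illustrative helper):

```r
# Draw Y directly from Y = (I - rho W)^{-1} (M beta + eps)
simulate.neteff <- function(W, M, beta, rho, sigma = 1) {
  n <- nrow(W)
  eps <- rnorm(n, 0, sigma)
  solve(diag(n) - rho * W, M %*% beta + eps)
}

# With sigma = 0 the draw is deterministic and satisfies the structural
# equation Y = rho W Y + M beta exactly
W.toy <- rbind(c(0, 1), c(1, 0))
M.toy <- cbind(1, c(0, 1))              # constant and one covariate
b.toy <- c(2, 1)
Y <- simulate.neteff(W.toy, M.toy, b.toy, rho = 0.3, sigma = 0)
max(abs(Y - (0.3 * W.toy %*% Y + M.toy %*% b.toy)))  # 0
```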
The same function as before, lnam, is used to fit this model. The difference from the network autocorrelation model is that the weight matrix is specified as W1, not W2.
neteff.2 <-lnam(y=y, x=M, W1=W)
summary(neteff.2)
##
## Call:
## lnam(y = y, x = M, W1 = W)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.04530 -0.70707 0.08996 0.74132 2.41599
##
## Coefficients:
## Estimate Std. Error Z value Pr(>|z|)
## cons 1.78401 0.42224 4.225 2.39e-05 ***
## degree 0.07223 0.05714 1.264 0.206154
## smoke 1.14399 0.31259 3.660 0.000253 ***
## sport 0.19588 0.31403 0.624 0.532785
## rho1.1 0.29178 0.17255 1.691 0.090838 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Estimate Std. Error
## Sigma 0.9404 0.009
##
## Goodness-of-Fit:
## Residual standard error: 1.025 on 45 degrees of freedom (w/o Sigma)
## Multiple R-Squared: 0.3229, Adjusted R-Squared: 0.2477
## Model log likelihood: -68.05 on 44 degrees of freedom (w/Sigma)
## AIC: 148.1 BIC: 159.6
##
## Null model: meanstd
## Null log likelihood: -79.54 on 48 degrees of freedom
## AIC: 163.1 BIC: 166.9
## AIC difference (model versus null): 14.98
## Heuristic Log Bayes Factor (model versus null): 7.336
There is no evidence of social influence on alcohol use: rho1.1 is not significantly different from zero at the 5%-level.
Fit a regression that, besides the intercept, only includes smoke, and that uses the original, non-normalised adjacency matrix
neteff.2 <-lnam(y=y, x=M[,c(1,3)], W1=X)
summary(neteff.2)
##
## Call:
## lnam(y = y, x = M[, c(1, 3)], W1 = X)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.91250 -0.60102 -0.06224 0.75113 2.24134
##
## Coefficients:
## Estimate Std. Error Z value Pr(>|z|)
## cons 2.21840 0.25578 8.673 < 2e-16 ***
## smoke 1.10413 0.30849 3.579 0.000345 ***
## rho1.1 0.06896 0.03219 2.142 0.032153 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Estimate Std. Error
## Sigma 0.9503 0.009
##
## Goodness-of-Fit:
## Residual standard error: 1.013 on 47 degrees of freedom (w/o Sigma)
## Multiple R-Squared: 0.3161, Adjusted R-Squared: 0.2724
## Model log likelihood: -68.59 on 46 degrees of freedom (w/Sigma)
## AIC: 145.2 BIC: 152.8
##
## Null model: meanstd
## Null log likelihood: -79.54 on 48 degrees of freedom
## AIC: 163.1 BIC: 166.9
## AIC difference (model versus null): 17.9
## Heuristic Log Bayes Factor (model versus null): 14.07
What do you conclude from the results? Can you relate this to total and average exposure in the diffusion model?
When the matrix is not row-normalised, there is evidence of social influence. This suggests that it is the number of drinking contacts (total exposure) that exerts influence, not the proportion of friends (average exposure).
This tutorial introduces aspects of Bayesian estimation for auto-logistic actor attribute models (ALAAMs) (Robins, Pattison, and Elliott 2001; Daraganova 2009; Daraganova and Robins 2013) developed in Koskinen and Daraganova (2022).
The main functions are defined in `balaam`, which can be "loaded" from GitHub.
source("https://raw.githubusercontent.com/johankoskinen/ALAAM/main/balaam.R")
A (proto-) manual is available on GitHub: alaam_effects.
There are a number of dependencies in the functions, but RStudio should prompt you to install any of the following if you have not already.
require(MASS)
## Loading required package: MASS
require('mvtnorm')
## Loading required package: mvtnorm
require('coda')
## Loading required package: coda
In particular, we will use sna (Butts 2016) and network (Butts 2015)
require(sna)
require(network)
We are looking at the s50 dataset, which is further described here: https://www.stats.ox.ac.uk/~snijders/siena/s50_data.htm
This dataset is available in zipped format online.
temp <- tempfile()
download.file("https://www.stats.ox.ac.uk/~snijders/siena/s50_data.zip",temp)
adj <- as.matrix( read.table(unz(temp, "s50-network1.dat")) )
sport <- read.table(unz(temp, "s50-sport.dat"))
smoke <- read.table(unz(temp, "s50-smoke.dat"))
alcohol <- read.table(unz(temp, "s50-alcohol.dat"))
unlink(temp)
n <- nrow(adj)
adj <- as.matrix(adj) # convert from data.frame to matrix
smoke <- smoke[,2] # use wave 2
smoke[smoke<2] <- 0 # set non-smoker to 0
smoke[smoke>0] <- 1 # set occasional and regular to 1
sport <- sport-1
The main estimation function estimate.alaam requires
data to be in data.frame format (check
alaam_effects.pdf)
my.data <- data.frame(smoke=smoke, alcohol=alcohol[,1],sport=sport[,1])
Check data
head(my.data)
## smoke alcohol sport
## 1 0 3 1
## 2 1 2 0
## 3 0 2 0
## 4 0 2 1
## 5 0 3 1
## 6 0 4 1
We might want to check what the associations are like, for example with alcohol
table(my.data$smoke,my.data$alcohol)
##
## 1 2 3 4 5
## 0 4 13 10 5 1
## 1 1 3 2 9 2
and with outdegree
boxplot(degree( adj , cmode = "outdegree") ~ my.data$smoke)
The social influence model developed by Robins, Pattison, and Elliott (2001), later elaborated by Daraganova (2009) and Daraganova and Robins (2013), and now referred to as the autologistic actor-attribute model (ALAAM), is a model for binary nodal attributes \(\boldsymbol{y}= \{Y_i:1 \leq i \leq n \}\), conditional on a network adjacency matrix \(\mathbf{X} = \{ X_{ij}: (i,j)\in V \times V \}\). \[ p_{\boldsymbol{\theta}}(\boldsymbol{y} | \mathbf{X} ) = \exp \left\{ \boldsymbol{\theta}^{\top} \boldsymbol{z}(\boldsymbol{y},\mathbf{X}) - \psi(\boldsymbol{\theta}; \mathbf{X}) \right\} \] Here \(\boldsymbol{z}(\boldsymbol{y},\mathbf{X})\) is a \(p\times 1\) vector of statistics calculated for the dependent variable \(\boldsymbol{y}\) and the network \(\mathbf{X}\).
This part of the tutorial takes you through the Bayesian inference scheme of Koskinen and Daraganova (2022).
The aim of the MCMC of Koskinen and Daraganova (2022) is to draw samples from, and thereby approximate, the posterior distribution \[ \pi(\boldsymbol{\theta} | \boldsymbol{y},\mathbf{X}) \propto p_{\boldsymbol{\theta}}(\boldsymbol{y} | \mathbf{X} ) \pi(\boldsymbol{\theta}) = \exp \left\{ \boldsymbol{\theta}^{\top} \boldsymbol{z}(\boldsymbol{y},\mathbf{X}) - \psi(\boldsymbol{\theta}; \mathbf{X}) \right\} \pi(\boldsymbol{\theta} ){\text{,}} \] where \(\pi(\boldsymbol{\theta} )\) is the prior distribution of the parameters. We need to use MCMC because the normalising constant of \(\pi(\boldsymbol{\theta} | \boldsymbol{y},\mathbf{X})\) is not analytically tractable (nor is the normalising constant of the model, \(\psi(\boldsymbol{\theta}; \mathbf{X})\)).
As for the functions lm and glm, the intuition of the formula is that the LHS is some transformation of the RHS. For ALAAM, we can think of the model specification in terms of the conditional logits
\[ {\rm{logit}}\left[ \Pr(Y_i = 1 \mid \boldsymbol{y}_{-i},\boldsymbol{\theta}) \right]=\boldsymbol{\theta}^{\top}\boldsymbol{\omega}_i(\boldsymbol{y},\mathbf{X})=\theta_1\omega_{1i}(\boldsymbol{y},\mathbf{X})+\cdots +\theta_p\omega_{pi}(\boldsymbol{y},\mathbf{X}) \] where the change statistics
\[ \omega_{ji}(\boldsymbol{y},\mathbf{X})=z_j(\Delta_i^+\boldsymbol{y},\mathbf{X})-z_j(\Delta_i^-\boldsymbol{y},\mathbf{X}) \] are the differences in the statistics evaluated on \(\boldsymbol{y}\) when the variable \(Y_i\) is forced to be \(y_i=1\), written \(\Delta_i^+\boldsymbol{y}\), and forced to be \(y_i=0\), written \(\Delta_i^-\boldsymbol{y}\).
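A brute-force sketch of such a change statistic, using the simple contagion statistic \(z(\boldsymbol{y},\mathbf{X})=\sum_{i,j}y_iy_jx_{ij}\) (the function names are illustrative helpers, not part of balaam):

```r
# The statistic and its change statistic, computed by forcing y_i to 1 and to 0
z.simple <- function(y, X) sum(outer(y, y) * X)

omega.simple <- function(i, y, X) {
  y.plus <- y;  y.plus[i]  <- 1         # Delta_i^+ y
  y.minus <- y; y.minus[i] <- 0         # Delta_i^- y
  z.simple(y.plus, X) - z.simple(y.minus, X)
}

# Toy directed network on 3 nodes; the change statistic for node 1 counts
# partners with y_j = 1, over both in- and out-ties
X.toy <- rbind(c(0, 1, 1),
               c(1, 0, 0),
               c(0, 1, 0))
omega.simple(1, c(0, 1, 1), X.toy)      # 3
```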
In other words we can think of
y ~ z1+z2+...+zp
## y ~ z1 + z2 + ... + zp
as stating that the LHS (y) should be the conditional
logit, and the RHS be the “linear predictor”.
As further described in alaam.terms
(alaam_effects.pdf), there are four classes of
effects:
The effect of a covariate x on a dichotomous outcome
variable y is specified as a standard formula
y ~ x
## y ~ x
Any additional effect of a monadic covariate z is added with +z, e.g.
y ~ x + z
## y ~ x + z
To include an interaction of two monadic covariates, the interaction
symbol * is used
y ~ x + z + x*z
## y ~ x + z + x * z
The following standard formulae functions are not supported:
y ~ x + I(x^2)# create the squared term manually and add it to data frame instead
## y ~ x + I(x^2)
y ~ x + I(x>0)# create the indicator term manually and add it to data frame instead
## y ~ x + I(x > 0)
y ~ -1 +x # the intercept term cannot be removed
## y ~ -1 + x
It is natural to include network metrics in your model. You are free
to precalculate any network measures that you find useful and add them
as monadic covariates. For example, you may want to add the effect of
betweenness centrality or closeness centrality,
clc <- closeness(adj) to your model, in which case you
add clc to the data frame.
Some of the network statistics that can be derived from the network activity dependency assumption (Section 3.1.1 Network activity dependence, and also Indirect Structural Influence, Koskinen and Daraganova (2022)) are available as built-in effects. For example, for a directed network, you can specify the effect of sending ties on the outcome y as
y ~ odegree
## y ~ odegree
The effects that are available from formula are:
| Effect name | Formula | Description |
|---|---|---|
degree |
\(y_ix_{i\cdot}=y_i\sum_j x_{ij}\) | For undirected networks, this measures the association of degree centrality and the probability of success |
idegree |
\(y_ix_{\cdot i}=y_i\sum_j x_{ji}\) | For directed networks, this measures the association of in-degree centrality and the probability of success |
odegree |
\(y_ix_{i\cdot}=y_i\sum_j x_{ij}\) | For directed networks, this measures the association of out-degree centrality and the probability of success |
recipties |
\(y_i\sum_j x_{ij}x_{ji}\) | For directed networks, this measures the association of reciprocated ties and the probability of success |
twostar |
\(y_i {\binom{x_{i\cdot}}{2}}\) | For undirected networks, the effect of centrality over and above degree |
intwostar |
\(y_i\binom{x_{\cdot i}}{2}\) | For directed networks, the effect of indegree centrality over and above indegree |
outtwostar |
\(y_i\binom{x_{i\cdot}}{2}\) | For directed networks, the effect of outdegree centrality over and above outdegree |
threestars |
\(y_i\binom{x_{i\cdot}}{3}\) | For undirected networks, the effect of degree centrality over and above twostars |
twopath |
\(y_i(x_{\cdot i}x_{i\cdot}-\sum_j x_{ij}x_{ji})\) | For directed networks, the association of brokerage on the probability of success. |
inthreestar |
\(y_i\binom{x_{\cdot i}}{3}\) | For directed networks, the effect of indegree centrality over and above intwostars |
outthreestar |
\(y_i\binom{x_{i\cdot}}{3}\) | For directed networks, the effect of outdegree centrality over and above outtwostars |
transties |
\(y_i\sum_{j,k \neq i}x_{ij}x_{ik}x_{jk}\) | For (directed) undirected networks, the effect on probability of success of being embedded in (transitive) triads |
indirties |
\(y_i\sum_j x_{ij} \sum_{k}(1-x_{ik})x_{jk}\) | For (directed) undirected networks, the effect on probability of success of having ties to people that have ties to many people you are not directly tied to (see 3.1.3 Indirect network and contagion dependencies, Koskinen and Daraganova, for details) |
Contagion effects are parameters of statistics that capture dependence among outcomes. These can be interpreted in terms of conditional distributions, for example for “simple” or “direct contagion” \[ {\rm{logit}}\left[\Pr(Y_i = 1 \mid \boldsymbol{y}_{-i},\boldsymbol{\theta})\right]=\theta_{DC}\sum_{j\neq i}y_jx_{ij}+c. \]
At the moment, the contagion effects that are implemented are:
| Effect name | Formula | Description |
|---|---|---|
simple |
\(\sum_{i,j}y_iy_jx_{ij}\) | is the probability of success increased by being connected to actors whose outcome is a success |
recip |
\(\sum_{i,j}y_iy_jx_{ij}x_{ji}\) | is the probability of success increased by being mutually tied to actors whose outcome is a success (directed networks only) |
indirect |
\(\sum_{i}y_i\sum_{j}x_{ij}\sum_{k\neq i,j}y_kx_{jk}\) | is the probability of success increased by being indirectly connected to actors whose outcome is a success (see 3.1.3 Indirect network and contagion dependencies, Koskinen and Daraganova, for details)) |
closedind |
\(\sum_{i}y_i\sum_{j}x_{ij}\sum_{k\neq i,j}y_kx_{ik}x_{jk}\) | is the probability of success increased by being both indirectly and directly connected to actors whose outcome is a success (see 3.1.3 Indirect network and contagion dependencies as well as supplementary material, Koskinen and Daraganova, for details)) |
transitive |
\(\sum_{i}y_i\sum_{j}x_{ij}y_{j}\sum_{k\neq i,j}y_kx_{ik}x_{jk}\) | is the probability of success increased by being embedded in triads where the two other members have success on the outcome(see 3.1.3 Indirect network and contagion dependencies as well as supplementary material, Koskinen and Daraganova, for details)) |
We will now illustrate two minimal examples of estimating models.
For a Markov model (Robins, Pattison, and Elliott 2001), the sufficient statistics are degrees \(x_{i\cdot}=\sum_j x_{ij}\), two-stars \(\binom{x_{i\cdot}}{2}\), three-stars \(\binom{x_{i\cdot}}{3}\), and triangles \(\sum_{j,k \neq i}x_{ij}x_{ik}x_{jk}\). These could be pre-calculated and used as monadic covariates, but we will draw on the functionality of balaam.
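If you did want to pre-calculate these Markov statistics yourself as monadic covariates, a base-R sketch for an undirected network (the toy matrix and variable names are illustrative):

```r
# Nodal Markov statistics for a symmetric adjacency matrix A
A <- rbind(c(0, 1, 1),
           c(1, 0, 1),
           c(1, 1, 0))                  # toy triangle graph
d <- rowSums(A)                         # degrees
two.stars <- choose(d, 2)               # two-stars centred on each node
three.stars <- choose(d, 3)             # three-stars centred on each node
triangles <- diag(A %*% A %*% A) / 2    # triangles each node is embedded in
cbind(d, two.stars, three.stars, triangles)
```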
Assume a model for smoke, where we include the effects odegree, alcohol, and sport:
res.0 <- estimate.alaam(smoke ~odegree+alcohol+sport, my.data, adjacency=adj,
Iterations=1000)
## Network is directed with 50 nodes
## [1] "you have p: 5"
##
## observed stats: 17 25 43 59 11
## number of covariates: 3
## A thinning of 1 ,
## (parameter) burn-in of 1 , and
## iterations of 1000 ,
## yields a sample size of 1000 .
## Estimation using pseudo likelihood:
##
## (Printing every 100 iterations)
##
## you have done 100 iterations out of 1000
## theta: -1.977 0 0.096 0.627 -1.038
## you have done 200 iterations out of 1000
## theta: -4.29 0 0.256 1.32 -1.478
## you have done 300 iterations out of 1000
## theta: -6.008 0 0.866 1.13 -0.882
## you have done 400 iterations out of 1000
## theta: -2.749 0 0.141 0.654 -0.496
## you have done 500 iterations out of 1000
## theta: -6.201 0 0.63 1.248 -0.485
## you have done 600 iterations out of 1000
## theta: -4.584 0 0.065 1.052 0.583
## you have done 700 iterations out of 1000
## theta: -2.732 0 0.177 0.858 -0.675
## you have done 800 iterations out of 1000
## theta: -3.791 0 0.247 0.995 -1.715
## you have done 900 iterations out of 1000
## theta: -1.347 0 -0.005 0.758 -1.486
## you have done 1000 iterations out of 1000
## theta: -1.954 0 0.159 0.327 -0.627
## summaries of the posterior draws:
## mean sd ESS SACF 10 SACF 30
## intercept -3.3954 1.3685 40.9843 0.4775 0.0819
## contagion 0.0000 0.0000 0.0000
## odegree 0.2458 0.3022 41.1220 0.4555 0.0307
## alcohol 0.9443 0.3677 39.3412 0.4535 0.1931
## sport -1.0165 0.8287 40.0459 0.4192 0.3181
##
## if ESS is small you need to increase the number of Iterations
## if and/or increase the parameter thinning which is currently 1 .
## Alternatively, adjust the proposals by improving the proposal variance-covariance:
## e.g. set PropSigma equal to the covariance of the posterior draws, or
## increase/decrease the scaling of the proposals (default: scaling=1/sqrt( 5 ); current: 1 /sqrt( 5 )
## Acceptance ratio: 0.484 (idealy around 0.25)
Taking the ANOVA table at face value, only alcohol has a clearly non-zero parameter and effect on smoking, judging by the size of the standard deviation relative to the posterior mean parameter.
Assume that in addition to the effects in the independent model, we
also want to account for social influence by including a direct
contagion effect. The effect name for this is simple.
res.DC.0 <- estimate.alaam(smoke ~idegree+odegree+alcohol+sport+simple, my.data, adjacency=adj,
Iterations=1000)
## Network is directed with 50 nodes
## [1] "you have p: 6"
##
## observed stats: 17 25 40 43 59 11
## number of covariates: 4
## A thinning of 1 ,
## (parameter) burn-in of 1 , and
## iterations of 1000 ,
## yields a sample size of 1000 .
##
## you have done 100 iterations out of 1000
## theta: -6.209 0.369 -0.391 0.224 1.989 -1.028
## you have done 200 iterations out of 1000
## theta: -5.002 0.927 -1.087 0.873 1.083 -0.449
## you have done 300 iterations out of 1000
## theta: -4.655 0.213 -0.316 0.698 0.865 -0.246
## you have done 400 iterations out of 1000
## theta: -2.288 0.681 -0.926 0.091 1.149 -1.794
## you have done 500 iterations out of 1000
## theta: -4.205 0.303 -1.07 1.009 1.566 -2.08
## you have done 600 iterations out of 1000
## theta: -0.038 0.85 -0.696 0.033 0.259 -1.787
## you have done 700 iterations out of 1000
## theta: -1.559 1.056 -0.486 -0.831 0.829 -1.008
## you have done 800 iterations out of 1000
## theta: -1.173 0.4 -0.622 -0.029 0.853 -1.757
## you have done 900 iterations out of 1000
## theta: -4.532 0.845 -0.87 0.228 1.017 0.263
## you have done 1000 iterations out of 1000
## theta: -2.071 0.852 -1.333 0.304 1.154 -2.335
## summaries of the posterior draws:
## mean sd ESS SACF 10 SACF 30
## intercept -3.3779 1.4755 17.7910 0.7121 0.4368
## contagion 0.6553 0.2909 26.6620 0.5746 0.0191
## idegree -0.7546 0.3637 14.2086 0.7325 0.3522
## odegree 0.3020 0.4178 17.2616 0.7132 0.4410
## alcohol 1.0540 0.3872 24.0355 0.6307 0.2350
## sport -1.1456 0.6902 31.4042 0.5630 0.3194
##
## if ESS is small you need to increase the number of Iterations
## if and/or increase the parameter thinning which is currently 1 .
## Alternatively, adjust the proposals by improving the proposal variance-covariance:
## e.g. set PropSigma equal to the covariance of the posterior draws, or
## increase/decrease the scaling of the proposals (default: scaling=1/sqrt( 6 ); current: 1 /sqrt( 6 )
## Acceptance ratio: 0.286 (idealy around 0.25)
Taking the ANOVA-like table at face value, it now seems only the contagion parameter is clearly non-zero, judging by the size of the standard deviation relative to the mean parameter.
The output gives us an ANOVA-like table with posterior means and standard deviations. We can also get this table from the estimation object
res.0$ResTab
## mean sd ESS SACF 10 SACF 30
## intercept -3.39535594 1.36845791 40.98428697 0.47753938 0.08194740
## contagion 0.00000000 0.00000000 0.00000000
## odegree 0.24579631 0.30220180 41.12195659 0.45552884 0.03069873
## alcohol 0.94434944 0.36765084 39.34118992 0.45351163 0.19309350
## sport -1.01649249 0.82873572 40.04592034 0.41923501 0.31813456
and a more detailed results table using
write.res.table(res.0,burnin=1,thin=1)
## parameter mean sd .025 0.975
## 1 intercept -3.395 1.368 -6.381 -0.870
## 2 contagion 0.000 0.000 0.000 0.000
## 3 odegree 0.246 0.302 -0.305 0.907
## 4 alcohol 0.944 0.368 0.319 1.809
## 5 sport -1.016 0.829 -2.599 0.500
If we were to create a 95% credibility interval for the parameter of alcohol, it would not include 0. The parameter is positive with high posterior probability.
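These quantities can be computed directly from the raw posterior draws; a sketch, assuming the columns of res.0$Thetas are named after the parameters as in the results table:

```r
# 95% credibility interval for the alcohol parameter, from the raw draws
quantile(res.0$Thetas[, "alcohol"], probs = c(0.025, 0.975))

# posterior probability that the alcohol parameter is positive
mean(res.0$Thetas[, "alcohol"] > 0)
```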
The summaries in the table are simply summary statistics for the full \(p\)-dimensional posterior. For example,
mean(res.0$Thetas[,1])
## [1] -3.395356
sd(res.0$Thetas[,1])
## [1] 1.368458
The full distribution can be plotted manually
hist(res.0$Thetas[,1])
And bivariate plots can also be made manually
plot(res.0$Thetas[,c('intercept')],res.0$Thetas[,c('odegree')])
In balaam, the function plotPost produces histograms, trace plots, and autocorrelation plots for all parameters in your model.
plotPost(res.0,figname='markov 0',showplot=TRUE)
## Loading required package: xtable
## Parameter 2 all zeros
We see that the posterior distribution for alcohol is concentrated to the right of 0.
The MCMC algorithm generates a sequence
\[ \boldsymbol{\theta}_0,\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_T \]
of \(T\) parameter draws from the posterior \(\pi(\boldsymbol{\theta} \mid \mathbf{y})\). The draws are made by proposing a new value \(\boldsymbol{\theta}^{\ast}\) given a current value \(\boldsymbol{\theta}_t\) in iteration \(t\): \[ \boldsymbol{\theta}^{\ast} \mid \boldsymbol{\theta}_t \thicksim \mathcal{N}_p( \boldsymbol{\theta}_t , \boldsymbol{\Sigma}). \]
This new value is either accepted, and \(\boldsymbol{\theta}_{t+1}\) is set to \(\boldsymbol{\theta}^{\ast}\), or rejected, in which case \(\boldsymbol{\theta}_{t+1}\) is set to \(\boldsymbol{\theta}_{t}\).
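A minimal sketch of this random-walk Metropolis scheme, with a placeholder log.post function standing in for an actual (tractable) log-posterior; this is illustrative only, not how balaam implements the update:

```r
# Illustrative random-walk Metropolis sampler (not the balaam internals):
# log.post() is a stand-in for a log-posterior we can evaluate directly.
rw.metropolis <- function(log.post, theta0, Sigma, T) {
  p <- length(theta0)
  L <- t(chol(Sigma))                           # so that L %*% z ~ N(0, Sigma)
  draws <- matrix(NA, nrow = T, ncol = p)
  theta <- theta0
  for (t in 1:T) {
    theta.star <- theta + drop(L %*% rnorm(p))  # theta* ~ N(theta_t, Sigma)
    if (log(runif(1)) < log.post(theta.star) - log.post(theta)) {
      theta <- theta.star                       # accept theta*
    }                                           # otherwise keep theta_t
    draws[t, ] <- theta
  }
  draws
}
```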
We can plot the sequence of updates \(\boldsymbol{\theta}_0,\ldots,\boldsymbol{\theta}_T\), to get a sense of whether updates are large or small, and if many or few proposed values \(\boldsymbol{\theta}^{\ast}\) are accepted
plot( ts( res.0$Thetas) )
What we are looking for in the trace plots is chains that fluctuate freely around a stable mean: no trends, and no long flat stretches where every proposal is rejected.
The default is to set \(\boldsymbol{\theta}_0\) to the maximum likelihood estimate (MLE) for an independent model. These starting values should be sufficiently good for most models. We can however look at the contagion parameter for the DC model, for which the parameter is initialised at \(\theta_{DC}=0\)
plot( ts( res.DC.0$Thetas[,2]), ylab=colnames(res.DC.0$Thetas)[2], xlab='Iteration' )
lines( cumsum(res.DC.0$Thetas[,2])/c(1:1000), col='red')
> There is something of an increasing trend.
Since the MCMC is iterative, the chains could be slow to move around the parameter space. This would mean that values \(\boldsymbol{\theta}_{t}\) and \(\boldsymbol{\theta}_{s}\), for iterations \(s\) and \(t\) that are close to each other, are likely to be more similar, more correlated, than for iterations \(s\) and \(t\) that are far apart. This is the first source of serial autocorrelation in the chains. The second source relates to how big the proposed jumps are, i.e. how close is \(\boldsymbol{\theta}^{\ast}\) to the current value \(\boldsymbol{\theta}_t\) in iteration \(t\)? If we make too small jumps, values for iterations \(s\) and \(t\) that are close to each other will be highly correlated.
A perfect sampler would propose and accept \(\boldsymbol{\theta}^{\ast}\) regardless of where we currently are in iteration \(t\). If this were the case, then the effective sample size would be equal to the total number of iterations. As a fictitious example, consider drawing 100 normal variates
# WHITE NOISE
theta.hypothetical <- rnorm(100, mean =1, sd=1.5)
par( mfrow= c(1,2) )
plot(theta.hypothetical,type='l')
hist(theta.hypothetical)
abline(v=1)
The draws here randomly fluctuate around the mean (1), and if we project the draws onto a histogram, this gives us the sample from our target distribution.
For ALAAM posteriors we have three ways of checking the autocorrelation.
The results table provides the sample autocorrelation between draws \(\boldsymbol{\theta}_{t}\) and \(\boldsymbol{\theta}_{t+k}\), for lags \(k=10\) and \(k=30\).
res.DC.0$ResTab
## mean sd ESS SACF 10 SACF 30
## intercept -3.37793664 1.47549094 17.79100523 0.71210864 0.43679635
## contagion 0.65526451 0.29086453 26.66197009 0.57455578 0.01912824
## idegree -0.75457907 0.36366779 14.20860936 0.73246188 0.35222317
## odegree 0.30203517 0.41782726 17.26160145 0.71316465 0.44102787
## alcohol 1.05395604 0.38718390 24.03548280 0.63071328 0.23496987
## sport -1.14555868 0.69017863 31.40419919 0.56299232 0.31941782
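The SACF 10 and SACF 30 columns are simply the sample autocorrelations at lags 10 and 30; assuming the contagion column of the draws is named as in the table, they can be recovered with acf:

```r
# lag-10 and lag-30 sample autocorrelations of the contagion draws
# (acf stores lag 0 at position 1, so lag k sits at position k + 1)
sacf <- acf(res.DC.0$Thetas[, "contagion"], lag.max = 30, plot = FALSE)
sacf$acf[c(11, 31)]
```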
The plotting function gives us the correlations for all lags.
We can use the standard function acf, for example for
the contagion parameter
acf( res.DC.0$Thetas[,2] )
We can compare this with the white noise, where we have 100
independent draws.
acf(theta.hypothetical)
as.numeric(acf(theta.hypothetical,plot=FALSE)[c(5,10)][[1]])
## [1] 0.02049821 -0.11053814
With high autocorrelation we need more iterations to get a “representative” sample from the posterior. Intuitively, with high autocorrelation your posterior will look very different depending on which subset of iterations you look at!
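To see this, compare the white-noise draws with a strongly autocorrelated chain; a sketch using a simulated AR(1) series:

```r
# a strongly autocorrelated AR(1) chain: neighbouring draws are very similar
set.seed(1)
theta.ar <- arima.sim(model = list(ar = 0.95), n = 1000)
par(mfrow = c(1, 2))
plot(theta.ar[1:500], type = 'l', main = 'first 500 draws')
plot(theta.ar[501:1000], type = 'l', main = 'last 500 draws')
```

The two halves can look like samples from quite different distributions, unlike the white-noise case.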
NOTE The long-run behaviour of the Monte Carlo mean \[ \bar{\boldsymbol{\theta}}=\frac{1}{T}\sum_{t=0}^T\boldsymbol{\theta}_t \]
will not be affected by high autocorrelation. The long-run behaviour (and quality) of the Monte Carlo estimate of the variance-covariance matrix \[ \bar{V}(\boldsymbol{\theta})=\frac{1}{T}\sum_{t=0}^T\boldsymbol{\theta}_t\boldsymbol{\theta}_t^{\top}-\bar{\boldsymbol{\theta}}\bar{\boldsymbol{\theta}}^{ \top} \]
and of posterior credibility intervals will be (potentially) severely affected.
The effective sample size (ESS) is an estimate of how many independent draws your sample is equivalent to. For the white noise, we have 100 independent draws, so the effective (independent) sample size equals the number of draws.
effectiveSize(theta.hypothetical)
## var1
## 100
For the DC model, we did 1000 iterations and the reported ESS was around 30 for most parameters. This means that we need roughly
1000/30
## [1] 33.33333
updates in the MCMC for every approximately independent draw.
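The same calculation can be done per chain with effectiveSize; a sketch, assuming the contagion column of res.DC.0$Thetas is named as in the results table:

```r
# ESS of the contagion chain, as reported in the results table
effectiveSize(res.DC.0$Thetas[, "contagion"])

# implied number of MCMC updates per approximately independent draw
nrow(res.DC.0$Thetas) / effectiveSize(res.DC.0$Thetas[, "contagion"])
```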
We can continue the estimation where the previous estimation finished
using prevBayes. We can also set par.burnin
and thinning to reduce the SACF. With 5100 iterations,
a burn-in of 100, and thinning of 10, we will get a total sample size of
(5100-100)/10=500.
# Previous call for reference
#res.DC.0 <- estimate.alaam(smoke ~idegree+odegree+alcohol+sport+simple, my.data, adjacency=adj,
# Iterations=1000)
res.DC.1 <- estimate.alaam(smoke ~idegree+odegree+alcohol+sport+simple, my.data, adjacency=adj,
Iterations=5100,
prevBayes=res.DC.0,# our first estimation
par.burnin=100,# discard the first 100 iterations
thinning=10)# only use every 10th draw
## [1] "you have p: 6"
##
## observed stats: 17 25 40 43 59 11
## number of covariates: 4
## A thinning of 10 ,
## (parameter) burn-in of 100 , and
## iterations of 5100 ,
## yields a sample size of 501 .
##
## you have done 510 iterations out of 5100
## theta: -7.233 0.333 -0.751 0.91 1.976 -1.758
## you have done 1020 iterations out of 5100
## theta: -4.295 0.555 -1.393 1.601 1 -1.318
## you have done 1530 iterations out of 5100
## theta: -2.635 0.234 -0.994 0.484 0.991 -0.453
## you have done 2040 iterations out of 5100
## theta: -2.003 0.795 -0.668 -0.127 0.942 -1.607
## you have done 2550 iterations out of 5100
## theta: -4.806 0.524 -0.959 0.669 1.296 -0.676
## you have done 3060 iterations out of 5100
## theta: -1.365 0.145 -0.621 0.082 0.954 -1.213
## you have done 3570 iterations out of 5100
## theta: -6.175 0.668 -0.187 0.197 1.206 -0.02
## you have done 4080 iterations out of 5100
## theta: -5.843 1.467 -1.313 1.268 1.076 -1.538
## you have done 4590 iterations out of 5100
## theta: -5.483 1.076 -1.657 0.902 1.639 -1.88
## you have done 5100 iterations out of 5100
## theta: -2.9 0.862 -0.434 -0.032 0.693 -1.024
## summaries of the posterior draws:
## mean sd ESS SACF 10 SACF 30
## intercept -3.7220 1.7260 63.5302 0.1125
## contagion 0.6272 0.3020 86.9372 0.0482
## idegree -0.7739 0.3486 53.7528 0.1768
## odegree 0.5038 0.4326 55.3393 0.1267
## alcohol 1.0246 0.4453 83.6292 0.0903
## sport -1.0839 0.8117 108.1557 0.0416
##
## if ESS is small you need to increase the number of Iterations
## if and/or increase the parameter thinning which is currently 10 .
## Alternatively, adjust the proposals by improving the proposal variance-covariance:
## e.g. set PropSigma equal to the covariance of the posterior draws, or
## increase/decrease the scaling of the proposals (default: scaling=1/sqrt( 6 ); current: 1 /sqrt( 6 )
## Acceptance ratio: 0.4896078 (idealy around 0.25)
Draws 10 iterations apart are now very close to independent, judging by the SACF. ESS values are close to 100.
The figures look promising, but the proportion of accepted proposals (the acceptance ratio) is high. This suggests that the chains might be making moves that are too small. Check the trace plots:
plot( ts( res.DC.1$Thetas ))
If we set recalibrate equal to TRUE, we will estimate \(\hat{\boldsymbol{\Sigma}}=V(\boldsymbol{\theta} \mid \boldsymbol{y})\), and propose moves from \[ \boldsymbol{\theta}^{\ast} \mid \boldsymbol{\theta}_t \thicksim \mathcal{N}_p( \boldsymbol{\theta}_t , \tfrac{c}{\sqrt{p}}\hat{\boldsymbol{\Sigma}}) \] The tuning constant \(c\) is given by the argument scaling.
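Following the suggestion printed in the output, the proposal covariance could also be computed by hand from the previous posterior draws (assuming it can then be supplied via PropSigma, as the printed message implies):

```r
# empirical covariance of the previous posterior draws,
# a candidate proposal variance-covariance matrix
Sigma.hat <- cov(res.DC.1$Thetas)
round(Sigma.hat, 2)
```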
res.DC.2 <- estimate.alaam(smoke ~idegree+odegree+alcohol+sport+simple, my.data, adjacency=adj,
Iterations=5100,
prevBayes=res.DC.0,# our first estimation
par.burnin=100,# discard the first 100 iterations
thinning=10,# only use every 10
recalibrate=TRUE,# use proposal variance from previous posterior
scaling = 0.55)# scale down
## [1] "you have p: 6"
##
## observed stats: 17 25 40 43 59 11
## number of covariates: 4
## A thinning of 10 ,
## (parameter) burn-in of 100 , and
## iterations of 5100 ,
## yields a sample size of 501 .
##
## you have done 510 iterations out of 5100
## theta: -3.1 0.343 -0.665 0.355 1.131 -1.491
## you have done 1020 iterations out of 5100
## theta: -1.043 0.417 -0.43 -0.26 0.933 -2.581
## you have done 1530 iterations out of 5100
## theta: -1.769 0.718 -1.303 0.342 1.043 -1.713
## you have done 2040 iterations out of 5100
## theta: -3.569 0.11 -1.452 0.776 2.123 -2.971
## you have done 2550 iterations out of 5100
## theta: -4.135 0.584 -0.06 0.179 0.603 0.061
## you have done 3060 iterations out of 5100
## theta: -5.852 0.386 -0.518 0.204 1.409 1.009
## you have done 3570 iterations out of 5100
## theta: -2.482 0.761 -0.711 0.503 0.759 -1.937
## you have done 4080 iterations out of 5100
## theta: -4.629 0.245 -1.041 1.383 1.153 -0.677
## you have done 4590 iterations out of 5100
## theta: -6.768 0.664 -0.978 0.613 1.932 -0.953
## you have done 5100 iterations out of 5100
## theta: -3.612 0.926 -1.095 0.58 0.831 -0.101
## summaries of the posterior draws:
## mean sd ESS SACF 10 SACF 30
## intercept -3.6637 1.4545 61.0515 0.1076
## contagion 0.5890 0.3228 45.3047 0.2180
## idegree -0.7629 0.4072 43.1752 0.0483
## odegree 0.4498 0.3832 46.6506 0.2298
## alcohol 1.0706 0.4167 70.8577 0.1980
## sport -0.9663 0.9692 25.2831 0.2912
##
## if ESS is small you need to increase the number of Iterations
## if and/or increase the parameter thinning which is currently 10 .
## Alternatively, adjust the proposals by improving the proposal variance-covariance:
## e.g. set PropSigma equal to the covariance of the posterior draws, or
## increase/decrease the scaling of the proposals (default: scaling=1/sqrt( 6 ); current: 0.55 /sqrt( 6 )
## Acceptance ratio: 0.04588235 (idealy around 0.25)
plot( ts( res.DC.2$Thetas ))
Chains seem to make big moves but very few proposals are accepted.
Possible remedies include
1. Reduce scaling
2. Buy better mixing by increasing iterations and thinning
3. Recalibrate yet another time
That the chains are not trending and that autocorrelation is not too high is more important than any single number. Conceptually, if you have an ESS of, say, 10, then you will only have precision down to the first decimal. With an ESS of 1000, you may have precision down to the second or third decimal.
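This intuition corresponds to the Monte Carlo standard error, roughly the posterior standard deviation divided by the square root of the ESS; a sketch, assuming the ResTab columns are named as printed:

```r
# Monte Carlo standard error of each posterior mean, roughly sd / sqrt(ESS)
mc.se <- res.DC.2$ResTab[, "sd"] / sqrt(res.DC.2$ResTab[, "ESS"])
round(mc.se, 3)
```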
When determining whether to accept \(\boldsymbol{\theta}^{\ast}\), a replicate dataset \(\boldsymbol{y}^{\ast}\) is drawn from the model \[ \boldsymbol{y}^{\ast} \thicksim p_{\boldsymbol{\theta}^{\ast}}(\boldsymbol{y} | \mathbf{X} ) \] The parameter \(\boldsymbol{\theta}^{\ast}\) is accepted into the posterior if \(\boldsymbol{y}^{\ast}\) is sufficiently similar to observed data \(\boldsymbol{y}\). More specifically, the parameter is accepted with probability \[ \min \left\{1,e^h \right\} \] where \[ h = (\boldsymbol{\theta}_t-\boldsymbol{\theta}^{\ast})^{\top}(\boldsymbol{z}(\boldsymbol{y}^{\ast},\mathbf{X})-\boldsymbol{z}(\boldsymbol{y},\mathbf{X})) \]
We cannot draw \(\boldsymbol{y}^{\ast}\) directly from the model, but have to rely on MCMC. As with the algorithm for generating \(\boldsymbol{\theta}\), a sequence \[ \boldsymbol{y}^{\ast}_0,\boldsymbol{y}^{\ast}_1,\ldots,\boldsymbol{y}^{\ast}_M \] is generated iteratively, and only the last iteration is used. The number \(M\) of iterations used to obtain one draw is set by the argument burnin. This number should be at least \[ M > 0.25 \times n \times \kappa \] for \(\kappa\) greater than 30.
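For the s50 network (\(n=50\)) this rule of thumb gives:

```r
# minimum burn-in M > 0.25 * n * kappa, with n = 50 nodes and kappa = 30
0.25 * 50 * 30
## [1] 375
```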
To appraise how well the estimated model replicates data, the goodness-of-fit (GOF) procedure simulates replicate data \[ \boldsymbol{y}_t^{(rep)}\thicksim p_{\boldsymbol{\theta}_t}(\boldsymbol{y}^{(rep)} | \mathbf{X} ) \] for each of the parameter draws \(\boldsymbol{\theta}_t\) in our posterior sample. Fit of the replicate data to observed data is then judged by comparing \(\boldsymbol{y}_t^{(rep)}\) to \(\boldsymbol{y}\) on a number of metrics \(S_k:\mathcal{Y}\times\mathcal{X} \rightarrow \mathbb{R}\). The posterior \(p\)-value is defined as \[ \mathbb{E}[|S_k(\boldsymbol{y}^{(rep)})-\mathbb{E}(S_k(\boldsymbol{y}^{(rep)}))|>|S_k(\boldsymbol{y})-\mathbb{E}(S_k(\boldsymbol{y}^{(rep)}))|] \]
### GOF Statistics
The pre-programmed statistics \(S_k\) are
| GOF-name | interpretation | statistic |
|---|---|---|
| intercept | intercept | \(\sum y_{i}\) |
| simple cont. | direct contagion through outgoing ties | \(\sum y_{i}y_{j}x_{i,j}\) |
| recip cont. | contagion through reciprocated ties | \(\sum y_{i}y_{j}x_{i,j}x_{j,i}\) |
| indirect cont. | indirect contagion | \(\sum_{j,k}y_ix_{i,j}x_{j,k}y_k\) |
| closedind cont. | contagion in closed triad | \(\sum_{j,k}y_ix_{i,j}x_{j,k}x_{i,k}y_k\) |
| transitive cont. | contagion in transitive triple | \(\sum_{j,k}x_{i,j}x_{j,k}x_{i,k}y_iy_jy_k\) |
| outdegree | Markov outdegree | \(\sum y_{i}\sum_j x_{i,j}\) |
| indegree | Markov indegree | \(\sum y_{i}\sum_j x_{j,i}\) |
| reciprochation | Markov reciprocated ties | \(\sum y_{i}\sum_j x_{i,j}x_{j,i}\) |
| instar | Markov in-star | \(\sum y_{i} {\binom{\sum_j x_{j,i}}{2}}\) |
| outstar | Markov out-star | \(\sum y_{i} {\binom{\sum_j x_{i,j}}{2}}\) |
| twopath | Markov mixed star | \(\sum y_{i} \sum_{j,k} x_{i,j}x_{k,i}\) |
| in3star | Markov in-three star | \(\sum y_{i} \sum x_{j,i}x_{k,i}x_{h,i}\) |
| out3star | Markov out-three star | \(\sum y_{i} \sum x_{i,j}x_{i,k}x_{i,h}\) |
| transitive | Markov transitive triangle | \(\sum y_i \sum_{j,k}x_{i,j}x_{j,k}x_{i,k}\) |
| cyclic | Markov cyclic triangle | \(\sum y_i \sum_{j,k}x_{i,j}x_{j,k}x_{k,i}\) |
| indirect | Markov indirect, non-exclusive ties | \(\sum_{j} (x_{i,j} x_{j, +} - x_{i,j}x_{j,i})\) |
| excl.indirect | Markov indirect, unique nodes | \(\sharp \{ k : x_{ik}=0,\max_j(x_{i,j}x_{j,k})>0 \}\) |
| prop.alc.alter | a user-defined alter attribute variable | \(\frac{1}{1+x_{i,+}} \sum x_{i,j}a_{j}\) |
To generate a sample from the model implied by the independent-model estimates:
sim.0 <- get.gof.distribution(NumIterations=100, # number of vectors to draw
res=res.0, # the ALAAM estimation object that contains model and results
burnin=100, # no. iterations discarded from GOF distribution
contagion ='none') # should be the same as for model fitted
## Simulating GOF took 0.9588392
## Calculating statistics took 0.118017
gof.table(obs.stats= sim.0$stats, # observed statistics, including non-fitted statistics
sim.stats= sim.0$Sav.gof, # simulated goodness-of-fit statistics
name.vec= sim.0$gof.stats.names, # names of calculated statistics; not all are used if undirected
tabname='ALAAMGofalt', # name of file saved
pvalues=TRUE, # posterior predictive p-values
save.tab ='csv' # save a csv file or a LaTeX file
)
## obs mean p-val
## intercept 17.000 16.820 0.200
## simple cont. 25.000 20.200 0.130
## recip cont. 7.000 6.550 0.180
## indirect cont. 59.000 38.450 0.055
## closedind cont. 22.000 18.720 0.160
## transitive cont. 13.000 11.450 0.205
## outdegree 43.000 42.830 0.220
## indegree 40.000 46.150 0.115
## reciprochation 26.000 29.280 0.120
## instar 57.000 68.850 0.115
## outstar 44.000 42.720 0.195
## twopath 88.000 101.900 0.125
## in3star 71.000 76.630 0.105
## out3star 22.000 20.940 0.190
## transitive 34.000 34.550 0.220
## cyclic 18.000 25.290 0.035
## indirect 93.000 82.700 0.135
## excl.indirect 53.000 43.860 0.110
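The p-val column follows the posterior \(p\)-value definition given earlier. A hypothetical helper sketching the computation for a single statistic (the exact layout of sim.0$Sav.gof may differ):

```r
# posterior predictive p-value for one statistic, per the definition:
# P( |S(y.rep) - E S(y.rep)| > |S(y.obs) - E S(y.rep)| )
pp.value <- function(sim.stats, obs.stat) {
  mu <- mean(sim.stats)                       # E S(y.rep), estimated
  mean(abs(sim.stats - mu) > abs(obs.stat - mu))
}
```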
The independent model struggles to replicate the contagion effects.
Repeat for the simple contagion model:
sim.2 <- get.gof.distribution(NumIterations=100, # number of vectors to draw
res=res.DC.2, # the ALAAM estimation object that contains model and results
thinning= 5000,# number of iterations to draw y
burnin=100, # no. iterations discarded from GOF distribution
contagion ='simple') # should be the same as for model fitted
## Simulating GOF took 4.995309
## Calculating statistics took 0.04985404
gof.table(obs.stats= sim.2$stats, # observed statistics, including non-fitted statistics
sim.stats= sim.2$Sav.gof, # simulated goodness-of-fit statistics
name.vec= sim.2$gof.stats.names, # names of calculated statistics; not all are used if undirected
tabname='ALAAMGofalt', # name of file saved
pvalues=TRUE, # posterior predictive p-values
save.tab ='csv', # save a csv file or a LaTeX file
directed=FALSE)
## obs mean p-val
## intercept 17.000 16.080 0.215
## simple cont. 25.000 19.780 0.115
## recip cont. 7.000 6.960 0.205
## indirect cont. 59.000 33.520 0.045
## closedind cont. 22.000 16.230 0.105
## transitive cont. 13.000 10.330 0.165
## degree 43.000 40.830 0.220
## reciprochation 26.000 25.840 0.230
## instar 57.000 39.780 0.085
## outstar 44.000 42.430 0.255
## twopath 88.000 76.910 0.170
## in3star 71.000 30.920 0.065
## out3star 22.000 23.400 0.200
## triadic 34.000 35.030 0.220
## cyclic 18.000 20.710 0.150
## indirect 93.000 83.680 0.205
## excl.indirect 53.000 44.170 0.135
There are no indications of poor fit.
GOF only checks whether there are features of the data that are not adequately captured by the model. We cannot say how much better or worse the GOF of one model is compared to another - either the model fits or it does not (and we do not want to overfit).
To compare models we may instead quantify the fit of each model, drawing an analogy with standard generalised linear models.